Approximate Solutions for Certain Optimal Stopping Problems

A.  John  Petkau 

1. INTRODUCTION
The following optimal stopping problem (which is one of several different problems which have come to be known as the one-armed bandit problem) has arisen both in the study of a sequential sampling inspection plan by Chernoff and Ray (1965) and in the study of a sequential model for clinical trials by Chernoff (1967), as well as in connection with problems involving the sequential estimation of the common mean of two normal populations considered by Mallik (1971) and Chernoff (1971):

Let X(t) be a Wiener process described by $E[dX(t)] = \mu\,dt$ and $\mathrm{Var}[dX(t)] = \sigma^2\,dt$, where $\sigma$ is presumed known. One is permitted to stop observing the process X(t) at any time t, $0 \le t \le N$, and receive a payoff X(t). The unknown parameter $\mu$ is assumed to have a $N(\mu_0, \sigma_0^2)$ prior. What is the optimal stopping procedure?

It is easy to verify that the posterior distribution of $\mu$ given $X(t')$, $0 \le t' \le t$, is $N(Y^*(s^*), s^*)$, where

$$Y^*(s^*) = s^*\left(\sigma_0^{-2}\mu_0 + \sigma^{-2}X(t)\right) \qquad (1.1)$$

and

$$s^* = \left(\sigma_0^{-2} + t\,\sigma^{-2}\right)^{-1}. \qquad (1.2)$$

Here $s^*$ varies from $s_0^* = \sigma_0^2$ to $s_1^* = \left(\sigma_0^{-2} + N\sigma^{-2}\right)^{-1}$. Furthermore, the process $Y^*(s^*)$ is a Wiener process (in the $-s^*$ scale) described by $E[dY^*(s^*)] = 0$ and $\mathrm{Var}[dY^*(s^*)] = -ds^*$, starting from $Y^*(s_0^*) = \mu_0$.

The loss upon stopping at $(Y^*(s^*), s^*)$ is $-X(t)$, which from (1.1) is a linear function of $-Y^*(s^*)/s^*$. Applying the transformation $s = s^*/s_1^*$, $Y(s) = Y^*(s^*)/s_1^{*1/2}$ leads to a normalized version of this stopping problem where s varies from $s_0 = \sigma_0^2(\sigma_0^{-2} + N\sigma^{-2})$ to $s_1 = 1$ and in which the stopping cost is given by d(y,s) defined by

$$d(y,s) = -y/s \qquad (1.3)$$

for $s \ge 1$ with stopping enforced at $s = 1$.
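As a concrete check of the reduction, the following Python sketch maps an observation $X(t) = x$ to the normalized point $(y, s)$, assuming the conjugate-normal formulas (1.1) and (1.2) as written above. The function names `normalize` and `d` and their argument names are illustrative only, not taken from the paper.

```python
import math


def normalize(t, x_t, mu0, sigma0, sigma, horizon):
    """Map X(t) = x_t at time t to the normalized point (y, s).

    Uses the posterior quantities (1.1)-(1.2) and the transformation
    s = s*/s*_1, Y(s) = Y*(s*)/sqrt(s*_1); `horizon` plays the role of N.
    """
    prec0 = sigma0 ** -2                            # prior precision
    prec = sigma ** -2                              # precision gained per unit time
    s_star = 1.0 / (prec0 + t * prec)               # posterior variance (1.2)
    y_star = s_star * (prec0 * mu0 + prec * x_t)    # posterior mean (1.1)
    s1_star = 1.0 / (prec0 + horizon * prec)        # posterior variance at t = N
    return y_star / math.sqrt(s1_star), s_star / s1_star


def d(y, s):
    """Normalized stopping cost (1.3)."""
    return -y / s


# sigma0 = sigma = 1 and N = 3 give s_0 = 1 + N*sigma0^2/sigma^2 = 4.
y0, s0 = normalize(t=0.0, x_t=0.0, mu0=0.5, sigma0=1.0, sigma=1.0, horizon=3.0)
print(y0, s0)  # prints 1.0 4.0
```

With these parameters one can also confirm numerically that $-X(t) = \sigma^2 s_1^{*-1/2}\,d(y,s) + \sigma^2\mu_0/\sigma_0^2$, which is the linear relation between the payoff and the normalized cost.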


This normalized problem is a special case of the following optimal stopping problem: Given a Wiener process $\{Y(s), s \ge s_1\}$ in the $-s$ scale described by $E[dY(s)] = 0$ and $\mathrm{Var}[dY(s)] = -ds$ and starting at $Y(s_0) = y_0$, find the stopping time S to minimize $E[d(Y(S),S)]$. If we define $\rho(y_0,s_0) = \inf b(y_0,s_0)$, where $b(y_0,s_0)$ is the risk associated with a particular stopping time and the infimum is taken over all such stopping times, then $\rho(y,s)$ represents the best that can be achieved once (y,s) has been reached, irrespective of how it was reached. An optimal procedure is then described by the continuation set $C = \{(y,s): \rho(y,s) < d(y,s)\}$. Chernoff (1968) has demonstrated that one should expect the solution $(\rho, C)$ of the stopping problem to be a solution of the following free boundary problem ($\partial C$ denotes the boundary of the set C):

$$\begin{aligned} \tfrac12\,\rho_{yy}(y,s) &= \rho_s(y,s) && \text{for } (y,s) \in C, \\ \rho(y,s) &= d(y,s) && \text{for } (y,s) \notin C, \\ \rho_y(y,s) &= d_y(y,s) && \text{for } (y,s) \in \partial C. \end{aligned} \qquad (1.4)$$


Furthermore, for any such stopping problem, Van Moerbeke (1974) has shown that one should never stop at points (y,s) at which $\tfrac12 d_{yy}(y,s) - d_s(y,s) < 0$.

Applying  this  criterion  to  the  normalized  version  of  the  stopping 
problem  described  above,  hereinafter  referred  to  as  the  one-armed  bandit 
problem,  one  finds  that  {(y,s):  y > 0,  s > 1}  must  be  a subset  of  the 
optimal  continuation  region  C . Chernoff  and  Ray  (1965)  have  shown  that 
for  this  problem  C can  be  described  as  C = {(y,s):  y > y(s),  s > 1} 
and  have  determined  asymptotic  expansions  for  the  boundary  curve  y(s) 
in  the  regions  of  large  s and  s close  to  1.  The  leading  terms  in 
these  expansions  are  given  by 


$$y(s) \approx -(2s\log s)^{1/2} \quad \text{as } s \to \infty,$$

$$y(s) \approx -0.64\,(s-1)^{1/2} \quad \text{as } s \to 1.$$
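The inclusion of $\{(y,s): y > 0, s > 1\}$ in C follows from Van Moerbeke's criterion by a one-line computation for $d(y,s) = -y/s$ (worked here for convenience):

$$\tfrac12\, d_{yy}(y,s) - d_s(y,s) = \tfrac12 \cdot 0 - \frac{y}{s^2} = -\frac{y}{s^2},$$

which is negative precisely when $y > 0$.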


The  scale  z = y/s  and  t = 1/s  is  more  appropriate  for  applications 
and  these  expansions  are  illustrated  in  this  scale  in  Chernoff  and  Ray 
(1965). These expansions also appear as the curves $z_0$ and $z_1$ in
Figure C of this paper. It is evident from this illustration that these
asymptotic  expansions  are  inadequate  as  a complete  description  of  the 
optimal  continuation  region.  An  approximation  to  the  optimal  continuation 
region  is  required  as  a description  of  the  optimal  procedure  in  the  region 
where  the  asymptotic  expansions  are  clearly  inadequate. 
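Substituting $z = y/s$ and $t = 1/s$ into the two leading terms gives their form in the (z,t) scale (a routine change of variables, shown here for convenience):

$$z \approx -\left(2t\log(1/t)\right)^{1/2} \quad \text{as } t \to 0, \qquad z \approx -0.64\,\left(t(1-t)\right)^{1/2} \quad \text{as } t \to 1.$$

For example, $(s-1)^{1/2}/s = t\,(1/t - 1)^{1/2} = (t(1-t))^{1/2}$.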

Although it is possible that refined methods of asymptotic analysis could lead to expansions which would provide an adequate description of the optimal procedure, the purpose of the present paper is to describe simple methods which lead to arbitrarily accurate numerical approximations to the optimal continuation region for the one-armed bandit problem. Although most of the discussion in the present paper will concentrate on the one-armed bandit problem, these same methods could be applied with equal facility to any optimal stopping problem of the general form described above.

2. APPROXIMATE SOLUTIONS

In this section we indicate how approximate solutions to the one-armed bandit problem can be obtained by replacing the problem for the Wiener process Y(s) by an analogous problem for a discrete-time discrete-step process which we will denote by $Y'(s')$.

Consider the process $Y'(s')$ which starts at $Y'(1+n\Delta) = y'$ and is defined by $Y'(s'-\Delta) = Y'(s') \pm \Delta^{1/2}$, each with probability $\tfrac12$. This process is observed for at most n successive times, and the cost associated with stopping the process at any point $(y',s')$ is given by $d(y',s')$ defined by (1.3). The problem is to find a stopping time to minimize the expected loss. We shall denote the optimal expected loss by $\rho'(y',s')$.

For this problem, the backward induction algorithm becomes

$$\rho'(y', 1+n\Delta) = \min\left\{ d(y', 1+n\Delta),\ \tfrac12\left[\rho'(y'+\Delta^{1/2}, 1+(n-1)\Delta) + \rho'(y'-\Delta^{1/2}, 1+(n-1)\Delta)\right]\right\} \qquad (2.1)$$

for $n \ge 1$ with $\rho'(y',1) = d(y',1)$. Using the methods of Chernoff and Petkau (1976), it is easy to verify that for this problem the optimal stopping set can be described as $\{(y', 1+n\Delta): y' \le y_n(\Delta), n \ge 1\}$, where for each fixed value of $\Delta$, $\{y_n(\Delta); n = 1, 2, \ldots\}$ is a non-increasing non-positive sequence. Note that this set does not depend upon the initial point. Further note that direct application of (2.1) yields $y_1(\Delta) = 0$.
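A minimal Python sketch of the backward induction (2.1) on a single grid $y' = c + k\Delta^{1/2}$ follows. It keeps every grid level (no use yet of the structure of the stopping set) and reports, for each stage, the highest level classified as a stopping point, which brackets $y_n(\Delta)$ from below. The function name `discrete_boundary`, the `margin` parameter (assumed wide enough to keep the boundary on the grid), and the small tie tolerance are our own illustrative choices.

```python
import math


def d(y, s):
    """Stopping cost (1.3) of the one-armed bandit problem: d(y, s) = -y/s."""
    return -y / s


def discrete_boundary(delta, n_max, c=0.0, margin=None):
    """Full-grid backward induction (2.1) for the +/- sqrt(delta) walk Y'(s').

    Returns a list whose (n-1)-th entry is the largest grid level c + k*sqrt(delta)
    classified as a stopping point at s' = 1 + n*delta, so that
    entry <= y_n(delta) < entry + sqrt(delta).
    """
    h = math.sqrt(delta)
    if margin is None:
        margin = n_max  # assumption: wide enough that the boundary stays on the grid
    half = margin + n_max  # grid half-width (in steps) at s' = 1
    levels = [c + k * h for k in range(-half, half + 1)]
    rho = [d(y, 1.0) for y in levels]  # rho'(y', 1) = d(y', 1)
    lower = []
    for n in range(1, n_max + 1):
        s = 1.0 + n * delta
        levels = levels[1:-1]  # edge points lose a neighbour, so drop them
        new_rho, top_stop = [], None
        for i, y in enumerate(levels):
            cont = 0.5 * (rho[i] + rho[i + 2])  # risks at y - h and y + h, stage n-1
            stop = d(y, s)
            if stop <= cont + 1e-12:  # ties (y' = 0 at n = 1) count as stopping
                new_rho.append(stop)
                top_stop = y
            else:
                new_rho.append(cont)
        rho = new_rho
        lower.append(top_stop)
    return lower


print(discrete_boundary(1.0, 2))  # prints [0.0, -1.0]
```

The first entry reproduces $y_1(\Delta) = 0$ on the grid with $c = 0$.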

Since  Y'(s')  is  a process  of  independent  increments  with  mean  zero 
and  variance  one  per  unit  change  in  -s',  any  stopping  problem  for  the 
Wiener  process  Y(s)  of  the  previous  section  can  be  imitated  by  the  use 
of  a small  value  of  A in  the  Y'(s')  process.  As  A approaches  zero, 
the  solution  of  the  analogous  discrete  problem  would  be  expected  to 
converge  to  the  solution  of  the  Wiener  process  problem.  In  particular, 
for  the  one-armed  bandit  problem  this  leads  to  the  initial  approximation 

$$y(1+n\Delta) \approx y_n(\Delta) \qquad (2.2)$$

where $y(1+n\Delta)$ denotes the optimal boundary for the one-armed bandit problem evaluated at $s = 1+n\Delta$.

It remains to evaluate the sequence $\{y_n(\Delta)\}$. Consider the $Y'(s')$ process defined as above on the grid of points $\{(y',s'): s' = 1+n\Delta,\ y' = c + k\Delta^{1/2};\ n = 0,1,2,\ldots;\ k = 0,\pm1,\pm2,\ldots\}$. Note that the grid is completely specified by the parameter c (for convenience, assume $0 \le c < \Delta^{1/2}$). Examination of (2.1) with the particular form of d(y,s) given in (1.3) reveals that for any given choice of $\Delta$, if the points $\{(y', 1+n\Delta): y' \le y^*\}$ are stopping points then so are the points $\{(y', 1+(n+1)\Delta): y' \le y^* - \Delta^{1/2}\}$. (From such a point both successors are stopping points, so the continuation risk is the average of the linear cost, namely $d(y', 1+n\Delta) = -y'/(1+n\Delta)$; since $y' < y^* \le y_n(\Delta) \le 0$, this is at least $d(y', 1+(n+1)\Delta) = -y'/(1+(n+1)\Delta)$.) This observation, together with the fact that the sequence $\{y_n(\Delta)\}$ is non-increasing, implies that when using the backward induction algorithm (2.1) to classify the grid points as either stopping or continuation points, the comparisons implied by (2.1) need be carried out at only a single value of y' for each fixed value of s', namely at the highest grid point that was a stopping point at the previous stage. The algorithm (2.1) can now be easily implemented in a direct fashion.
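The single-comparison version of the direct method can be sketched as follows, assuming the structure stated in the text: points below the previous stage's highest stopping level are stopping points by the shift property, points above it are continuation points because $y_n(\Delta)$ is non-increasing, and only that one level needs a comparison. The function name and the `margin` parameter are hypothetical.

```python
import math


def direct_boundary_single_comparison(delta, n_max, c=0.0, margin=None):
    """Direct method for (2.1) with one stop/continue comparison per stage.

    Relies on the structure described in the text: the stopping set at stage n is
    {y' <= y_n(delta)}, y_1(delta) = 0, {y_n(delta)} is non-increasing, and the
    shift property of the linear cost d(y, s) = -y/s.
    Returns the highest stopping grid level at each stage s' = 1 + n*delta.
    """
    h = math.sqrt(delta)
    margin = n_max if margin is None else margin  # assumed wide enough
    half = margin + n_max
    ks = list(range(-half, half + 1))
    rho = {k: -(c + k * h) for k in ks}  # rho'(y', 1) = d(y', 1) = -y'
    k_top = math.floor(-c / h)  # highest grid level <= 0 = y_1(delta)
    tops = []
    for n in range(1, n_max + 1):
        s = 1.0 + n * delta
        ks = ks[1:-1]
        k_star = k_top  # highest stopping index at stage n-1 (from y_1 = 0 when n = 1)
        new = {}
        for k in ks:
            y = c + k * h
            stop = -y / s
            cont = 0.5 * (rho[k - 1] + rho[k + 1])
            if k < k_star or (n == 1 and k == k_star):
                new[k] = stop  # shift property for n >= 2; y_1(delta) = 0 for n = 1
            elif k > k_star:
                new[k] = cont  # above y_{n-1}(delta), hence above y_n(delta)
            elif stop <= cont:  # the single comparison, at k = k_star
                new[k] = stop
            else:
                new[k] = cont
                k_top = k_star - 1
        rho = new
        tops.append(c + k_top * h)
    return tops


print(direct_boundary_single_comparison(1.0, 2))  # prints [0.0, -1.0]
```

Continuation averages are still formed at every level above the boundary; the saving is in the comparisons, exactly as the paragraph above describes.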

Due to the special nature of the one-armed bandit problem, namely the fact that all points (y,s) with y > 0 and s > 1 are continuation points, one might expect to be able to improve somewhat upon the naive approach outlined above. Consider a particular path of the $Y'(s')$ process originating at the point $(y',s') = (c+2\Delta^{1/2}, 1+n\Delta)$. The path of the $Y'(s')$ process could hit the line $y' = c+\Delta^{1/2}$ for the first time at $s' = 1+(n-1)\Delta, 1+(n-3)\Delta, \ldots$. Alternatively, the path could remain above the line $y' = c+\Delta^{1/2}$ all the way to $s' = 1$. Noting that the points $(c+\Delta^{1/2}, s')$ are continuation points for all $s' > 1$ (since $y_1(\Delta) = 0$ and the sequence $\{y_n(\Delta)\}$ is non-increasing) leads to the relation


$$\rho'(c+2\Delta^{1/2}, 1+n\Delta) = \sum_{m=1}^{n} p_m\,\rho'(c+\Delta^{1/2}, 1+(n-m)\Delta) + \sum_{k=1}^{n+1} q_{n,k}\,d(c+(k+1)\Delta^{1/2}, 1) \qquad (2.3)$$

where $p_m$ is the probability that an ordinary random walk starting at 0 first passes through 1 at time m, and $q_{n,k}$ is the probability that an ordinary random walk starting at 0 stays above -1 until time n and achieves level k-1 at time n. From Feller (1968, p. 89, Theorem 2) one finds

$$p_m = \begin{cases} 0 & \text{for } m \text{ even}, \\[4pt] \dfrac{1}{m}\dbinom{m}{(m+1)/2}\,2^{-m} & \text{for } m \text{ odd}. \end{cases}$$
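The weights in (2.3) are easy to tabulate. The sketch below evaluates $p_m$ from the formula above and computes $q_{n,k}$ by the reflection principle; the reflection-principle formula for $q_{n,k}$ is our own derivation, not one quoted in the text, and the function names are illustrative. Because every path either first reaches the line within n steps or stays above it, $\sum_{m \le n} p_m + \sum_k q_{n,k} = 1$, which serves as a check.

```python
import math


def p_first_passage(m):
    """P(simple random walk from 0 first passes through +1 at step m)."""
    if m % 2 == 0:
        return 0.0
    return math.comb(m, (m + 1) // 2) / (m * 2.0 ** m)


def q_stay_above(n, k):
    """q_{n,k}: walk from 0 stays above -1 up to step n and sits at level k - 1 at step n.

    Evaluated with the reflection principle (our derivation): paths from 0 to
    j = k - 1 that touch -1 are in one-to-one correspondence with paths from -2 to j.
    """
    j = k - 1
    if j < 0 or j > n or (n + j) % 2:
        return 0.0
    u = (n + j) // 2  # number of up-steps
    return (math.comb(n, u) - math.comb(n, u + 1)) / 2.0 ** n


print([p_first_passage(m) for m in range(1, 6)])  # prints [0.5, 0.0, 0.125, 0.0, 0.0625]
```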


The relation (2.3) provides a modified method of carrying out the backward induction which we shall call the truncation method: At $s' = 1$, the risks are specified by d(y,s). At any stage $s' = 1+n\Delta$, compute the risk at $y' = c+2\Delta^{1/2}$ by means of (2.3). The risks at the levels $y' = c+k\Delta^{1/2}$ for $k = 1, 0, -1, -2, \ldots$ are computed using the algorithm (2.1) in the fashion described above.

Returning for a moment to the one-armed bandit problem, it is well known (and obvious from (1.4)) that changing the stopping cost function d(y,s) by adding to it any solution of the heat equation leaves the optimal continuation region unchanged; the function y is trivially such a solution. For the present purposes, it is convenient to consider the new stopping cost function $d_0(y,s)$ defined by $d_0(y,s) = d(y,s) + y$, that is,

$$d_0(y,s) = y - y/s \qquad (2.6)$$

for $s \ge 1$ with stopping enforced at $s = 1$. Note that $d_0(y,1) = 0$. Denoting the corresponding optimal risk by $\rho_0(y,s)$, the algorithm (2.1) becomes

$$\rho_0'(y', 1+n\Delta) = \min\left\{ d_0(y', 1+n\Delta),\ \tfrac12\left[\rho_0'(y'+\Delta^{1/2}, 1+(n-1)\Delta) + \rho_0'(y'-\Delta^{1/2}, 1+(n-1)\Delta)\right]\right\} \qquad (2.7)$$

for $n \ge 1$ with $\rho_0'(y',1) = 0$. Further, the relation (2.3) becomes

$$\rho_0'(c+2\Delta^{1/2}, 1+n\Delta) = \sum_{m=1}^{n} p_m\,\rho_0'(c+\Delta^{1/2}, 1+(n-m)\Delta). \qquad (2.8)$$

This  reduces  the  computation  involved  in  carrying  out  the  truncation 
method. 
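A minimal sketch of the truncation method with the modified cost $d_0$ is given below, under the structural assumptions stated earlier (stopping set $\{y' \le y_n(\Delta)\}$, $y_1(\Delta) = 0$, shift property). Only the levels between the previous stopping level and $c + 2\Delta^{1/2}$ are stored; below them the risk is $d_0$ itself, and the top level is supplied by (2.8). Function and variable names are hypothetical.

```python
import math


def p_first_passage(m):
    """P(simple random walk from 0 first passes through +1 at step m)."""
    if m % 2 == 0:
        return 0.0
    return math.comb(m, (m + 1) // 2) / (m * 2.0 ** m)


def truncation_boundary(delta, n_max, c=0.0):
    """Truncation method with the modified cost d_0(y, s) = y - y/s of (2.6).

    Levels y' = c + k*sqrt(delta) with k <= 1 are updated by (2.7); level k = 2 by (2.8).
    Only levels strictly above the highest stopping level are stored: at or below it
    the risk equals d_0. Returns the highest stopping level at each s' = 1 + n*delta.
    """
    h = math.sqrt(delta)

    def level(k):
        return c + k * h

    def d0(y, s):
        return y - y / s  # (2.6)

    p = [0.0] + [p_first_passage(m) for m in range(1, n_max + 1)]
    k_top = math.floor(-c / h)  # y_1(delta) = 0
    prev = {k: 0.0 for k in range(k_top + 1, 3)}  # stage 1: continuation risk is 0
    at_level_1 = [0.0, 0.0]  # rho'_0(c + h, 1 + j*delta) for j = 0, 1
    tops = [level(k_top)]
    for n in range(2, n_max + 1):
        s_prev, s = 1.0 + (n - 1) * delta, 1.0 + n * delta
        k_star = k_top  # highest stopping level at stage n - 1

        def value(k):
            # risk at stage n - 1: stored above k_star, equal to d_0 at or below it
            return prev[k] if k > k_star else d0(level(k), s_prev)

        cur = {}
        for k in range(k_star, 2):  # levels k_star, ..., 1 via (2.7)
            cont = 0.5 * (value(k - 1) + value(k + 1))
            if k == k_star and d0(level(k), s) <= cont:
                continue  # the single comparison: k_star remains a stopping point
            cur[k] = cont
        if k_star in cur:
            k_top = k_star - 1  # k_star - 1 is a stopping point by the shift property
        cur[2] = sum(p[m] * at_level_1[n - m] for m in range(1, n + 1))  # (2.8)
        at_level_1.append(cur[1])
        prev = cur
        tops.append(level(k_top))
    return tops


print(truncation_boundary(1.0, 2))  # prints [0.0, -1.0]
```

Because $d_0(y',1) = 0$, the second sum in (2.3) disappears, so no terminal values above the line are ever needed.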

To  this  point  we  have  simply  described  the  direct  and  truncation 
methods  of  carrying  out  the  backward  induction  algorithm  for  the  Y'(s') 
process  when  the  motion  of  the  process  is  restricted  to  a grid  specified 
by  a particular  value  of  the  parameter  c . 

It should be emphasized that carrying out the backward induction algorithm for a particular grid simply classifies all the grid points as either stopping or continuation points. The sequence $\{y_n(\Delta)\}$ itself is not determined. At each fixed value of $s' = 1+n\Delta$, the algorithm simply determines the two adjacent grid points between which the number $y_n(\Delta)$ must lie; that is, the algorithm determines the value of k $(= k(n,\Delta,c))$ such that

$$c + k\Delta^{1/2} \le y_n(\Delta) < c + (k+1)\Delta^{1/2}.$$

However, implementing the algorithm for different grids, all with the parameter c between 0 and $\Delta^{1/2}$, allows each member of the sequence $\{y_n(\Delta)\}$ to be determined to within any desired degree of accuracy. This will be indicated in more detail in the next section where the appropriate computations for the one-armed bandit problem are described.
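Combining grids amounts to intersecting brackets: each value of c yields an interval $[c + k\Delta^{1/2}, c + (k+1)\Delta^{1/2})$ containing $y_n(\Delta)$, and the intersection over c shrinks to roughly the spacing of the offsets. In the sketch below, `bracket_for_grid` is a hypothetical stand-in that builds the bracket from an assumed value `y_true`; in the actual procedure the bracket comes from the backward induction on each grid.

```python
import math


def refine_from_brackets(brackets):
    """Intersect half-open intervals [lo, hi), each known to contain y_n(delta)."""
    lo = max(b[0] for b in brackets)
    hi = min(b[1] for b in brackets)
    return lo, hi


def bracket_for_grid(y_true, c, h):
    """Hypothetical stand-in for one grid run: the grid interval containing y_true.

    In practice this bracket comes from the backward induction (the highest stopping
    level c + k*h and the continuation level above it), not from a known y_true.
    """
    k = math.floor((y_true - c) / h)
    return c + k * h, c + (k + 1) * h


h = math.sqrt(0.25)  # delta = 0.25
step = 0.01          # spacing of the offsets c
y_true = -1.2345     # hypothetical boundary value, for illustration only
offsets = [i * step for i in range(int(round(h / step)))]
lo, hi = refine_from_brackets([bracket_for_grid(y_true, c, h) for c in offsets])
print(lo, hi)  # an interval of width about `step` containing y_true
```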


To  this  point  we  have  presented  simple  methods  of  obtaining 
heuristic  initial  approximations  to  the  solutions  of  optimal  stopping 
problems  for  a zero  drift  standard  Wiener  process.  These  methods  involve 
replacing  the  Wiener  process  problem  by  an  analogous  discrete  problem 
involving  dichotomous  random  variables.  The  relation  of  the  solution 
of  any  such  Wiener  process  problem  to  the  solutions  of  an  entire  class  of 
analogous  discrete  problems  is  considered  in  Chernoff  and  Petkau  (1976). 

A particular result of that paper is the following simple approximate relation between the optimal boundary of any such Wiener process problem and the optimal boundary of the corresponding analogous discrete problem for the $Y'(s')$ process described above:

$$y(1+n\Delta) \approx y_n(\Delta) \pm 0.5\,\Delta^{1/2} \qquad (2.9)$$

(the sign being determined so as to make the continuation region for the Wiener process problem larger). For the one-armed bandit problem, whose continuation region $\{(y,s): y > y(s)\}$ grows as the boundary moves down, this leads to the following refinement of (2.2):

$$y(1+n\Delta) \approx y_n(\Delta) - 0.5\,\Delta^{1/2}. \qquad (2.10)$$

This refinement takes the form of a "correction for continuity", the solution of the analogous discrete problem being corrected in order to obtain (approximately) the solution of the Wiener process problem. The behavior of the approximations (2.2) and (2.10) will be examined in the next section.


3.  APPLICATIONS 


To illustrate the accuracy of these approximations, it would be desirable to evaluate them in a Wiener process problem for which the exact solution is known. A normalized version of a gambling problem discussed in Van Moerbeke (1974) is the following:

Suppose $\{X(s); 0 < s \le 1\}$ is a zero drift standard Wiener process and that a gambler wins money at a constant rate whenever the process occupies the positive x half-plane and loses money at the same constant rate whenever the process occupies the negative x half-plane. If the gambler is permitted to stop the process at any time s, $0 < s \le 1$, what is the gambler's optimal strategy?

This problem can be formulated as an optimal stopping problem by defining the reward g(x,s) for stopping the process at the point (x,s) to be

$$g(x,s) = \begin{cases} 1 - s + 2x^2 & \text{for } x \ge 0, \\ 1 - s & \text{for } x \le 0, \end{cases}$$

the problem being to find a stopping time that maximizes the expected reward. Van Moerbeke (1974) proves that the optimal continuation region for this problem can be described as $\{(x,s): x > x(s), 0 < s \le 1\}$, where $x(s) = -\alpha(1-s)^{1/2}$ and $\alpha$ is the solution of the simple equation

$$\alpha \int_0^\infty \exp[\lambda\alpha - \lambda^2/2]\,d\lambda = 1,$$

which can easily be determined to be $\alpha = 0.5061$. Modifying the reward function to be $\hat{g}(x,s) = g(x,s) - 2[x^2 + 1 - s]$ does not change the solution of this problem, and it is easily seen that the methods of the previous section are directly applicable (in particular, $\hat{g}(x,1) = 0$ for $x \ge 0$).
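The value of $\alpha$ can be reproduced with a few lines of Python. This sketch uses the closed form $\int_0^\infty \exp[\lambda\alpha - \lambda^2/2]\,d\lambda = e^{\alpha^2/2}\sqrt{2\pi}\,\Phi(\alpha)$, obtained by completing the square (our step, not stated in the text), and a plain bisection; the helper names are illustrative.

```python
import math


def Phi(x):
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def f(alpha):
    # alpha * int_0^inf exp(lambda*alpha - lambda^2/2) d lambda - 1, using
    # int_0^inf exp(lambda*alpha - lambda^2/2) d lambda = exp(alpha^2/2) * sqrt(2*pi) * Phi(alpha)
    return alpha * math.exp(alpha ** 2 / 2) * math.sqrt(2 * math.pi) * Phi(alpha) - 1.0


def bisect(func, lo, hi, tol=1e-12):
    """Plain bisection; assumes func(lo) < 0 < func(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha = bisect(f, 0.0, 1.0)  # f(0) = -1 < 0 and f(1) > 0, and f is increasing
print(round(alpha, 4))
```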

This Wiener process problem has been approximated by three different analogous discrete problems, those corresponding to $\Delta$ = 0.01, 0.0025 and 0.000625. For each fixed value of $\Delta$ the computations were carried out for all grids specified by values of the parameter c varying from 0 to $\Delta^{1/2}$ in steps of 0.001. Thus each individual member of each of the three sequences $\{x_n(\Delta)\}$ is located to within an error of 0.001. In addition the corrected sequences $\{x_n^*(\Delta)\}$ defined by $x_n^*(\Delta) = x_n(\Delta) - 0.5\,\Delta^{1/2}$ were evaluated. These six approximating sequences and the exact solution $x(s)$ are illustrated in Figure A in the (x,s) scale. Here $x_1 = \{x_n(0.01)\}$, $x_2 = \{x_n(0.0025)\}$, $x_3 = \{x_n(0.000625)\}$ and similarly $x_1^* = \{x_n^*(0.01)\}$, $x_2^* = \{x_n^*(0.0025)\}$, $x_3^* = \{x_n^*(0.000625)\}$. This figure clearly illustrates that for this particular problem, whereas the approximations provided by $x_1$, $x_2$ and $x_3$ are quite crude, the approximations provided by $x_1^*$, $x_2^*$ and $x_3^*$, particularly both $x_2^*$ and $x_3^*$, are exceptionally accurate, being virtually indistinguishable from each other and from the exact solution. The fact that $x_1^*$ is not very accurate reflects the fact that when using these approximations, one must begin with a reasonably small value of $\Delta$. Indeed, if we note that $x_1^*$ results from correcting the boundary $x_1$, which is obtained by approximating the Wiener process on the interval [0,1] by a simple random walk involving only 100 time steps, it is somewhat remarkable that $x_1^*$ achieves the accuracy that it does.

Figure  A 

The exceptional performance of the refined approximation (2.9) in Van Moerbeke's problem leads us to hope that the same type of behavior will occur in the one-armed bandit problem. In order to examine this possibility, the one-armed bandit problem was approximated by three different analogous discrete problems, those corresponding to $\Delta$ = 1.0, 0.25 and 0.0625. For each fixed value of $\Delta$ the computations were carried out for the region $1 < s < 100$ and for all grids specified by values of the parameter c varying from 0 to $\Delta^{1/2}$ in steps of 0.01. Thus each individual member of each of the three sequences $\{y_n(\Delta)\}$ is located to within an error of 0.01. In addition the corrected sequences $\{y_n^*(\Delta)\}$ defined by $y_n^*(\Delta) = y_n(\Delta) - 0.5\,\Delta^{1/2}$ were evaluated. These six approximating sequences are illustrated in Figure B. Here $y_1 = \{y_n(1.0)\}$, $y_2 = \{y_n(0.25)\}$, $y_3 = \{y_n(0.0625)\}$ and similarly $y_1^* = \{y_n^*(1.0)\}$, $y_2^* = \{y_n^*(0.25)\}$, $y_3^* = \{y_n^*(0.0625)\}$. This figure clearly indicates that for the one-armed bandit problem the approximations provided by $y_1^*$, $y_2^*$ and $y_3^*$ are exceptionally accurate, these curves being indistinguishable from one another.
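For the three values of $\Delta$ used here, the continuity correction $0.5\,\Delta^{1/2}$ subtracted in $y_n^*(\Delta)$ is easy to evaluate:

$$0.5\,(1.0)^{1/2} = 0.5, \qquad 0.5\,(0.25)^{1/2} = 0.25, \qquad 0.5\,(0.0625)^{1/2} = 0.125,$$

so even the smallest correction is more than ten times the 0.01 resolution with which the uncorrected sequences were located.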


Figure  B 

As pointed out in the introduction, for applications of the solution of the one-armed bandit problem the (z,t) scale, where $z = y/s$ and $t = 1/s$, is more appropriate. In order to obtain an accurate approximation to the optimal stopping boundary in the (z,t) scale in as efficient a manner as possible, the computations were carried out as follows: Beginning with a very small value of $\Delta$, the boundary was approximated in a small interval of s in the manner described above. Successively larger values of $\Delta$ were then employed to approximate the boundary in successively larger intervals of s. These approximations to the optimal boundary, determined in overlapping intervals of s, were then superimposed to obtain the final approximation to the optimal boundary. Since the values of $\Delta$ used were chosen in such a way as to yield the desired accuracy, only the value c = 0 was used in these computations. The computations were carried out using both the direct and the truncation methods. The truncation method reduced the computation time required by a factor of approximately two. The resulting approximation to the optimal stopping boundary is illustrated in Figure C together with the asymptotic expansions of Chernoff and Ray (1965). Here $z_0$ and $z_1$ denote the boundaries obtained using the asymptotic expansions for t close to 0 (s large) and t close to 1 (s close to 1) respectively, and z denotes the boundary obtained by means of the computations described above.


Figure  C 


4.  DISCUSSION 

As indicated in the introduction, the one-armed bandit problem has arisen in a number of statistical applications, and consequently considerable effort has been devoted to obtaining approximations to the optimal stopping boundary. Chernoff and Ray (1965) first formulated the problem and were able to derive asymptotic expansions for this boundary. In a later paper, Chernoff (1967) presents a small table of this boundary. Although it is not indicated in the paper, this approximation was obtained by approximating the Wiener process by a sum of independent normal random variables and applying a backward induction to the approximating process (private communication from Herman Chernoff). In carrying out this backward induction the normal distribution was approximated by a discrete distribution, thus allowing the integrations involved in each stage of the backward induction algorithm to be replaced by summations. Mallik (1971) presents a more detailed table of this boundary and indicates without clarification that a refined version of the technique used by Chernoff was used to obtain his table. Another detailed table appears in Chernoff (1971 and 1972), and it is in these references that it is first suggested that the $Y'(s')$ process of Section 2 be used to obtain an approximation to the optimal boundary for the one-armed bandit problem.

The purpose of the present paper was to describe explicitly how this approximation could be carried out and to demonstrate that by the use of the "correction for continuity" given in (2.9) this approximation could be made exceptionally accurate in an efficient manner. Obtaining the present approximation to the boundary of the one-armed bandit problem involved ten separate runs, the i-th run approximating the boundary in the region s = 1 to $s = 1 + 1550\Delta_i$ using a grid determined by c = 0 and $\Delta_i = 10^{-4}\cdot 4^{i-2}$. The entire computation, approximating the boundary in the region $1 < s < 10{,}000$, required just 36 seconds of computation time on the IBM 370/168 at UBC. The objective in the present computation was to obtain an accurate approximation in the (z,t) scale. Detailed examination of the computer output leads to the empirical estimate that in the (z,t) scale, for each fixed value of t, the boundary has been located to within an error of 0.001.
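The run schedule can be tabulated directly. This small sketch simply evaluates $\Delta_i = 10^{-4}\cdot 4^{i-2}$ and the upper end $s = 1 + 1550\Delta_i$ for $i = 1, \ldots, 10$, as the schedule is stated above, to show that every run uses the same 1550 backward-induction stages while the covered range $s - 1$ grows by a factor of four from run to run and the last run reaches beyond $s = 10{,}000$.

```python
# Run schedule as stated in the text:
# run i (i = 1..10) uses Delta_i = 1e-4 * 4**(i-2) and covers 1 <= s <= 1 + 1550*Delta_i.
runs = []
for i in range(1, 11):
    delta_i = 1e-4 * 4 ** (i - 2)
    s_max = 1 + 1550 * delta_i
    runs.append((i, delta_i, s_max))

for i, delta_i, s_max in runs:
    # (s_max - 1) / delta_i = 1550 stages for every run
    print(f"run {i:2d}: Delta = {delta_i:.6g}, s in [1, {s_max:.6g}], stages = 1550")
```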

For the one-armed bandit problem it is convenient to think of

$$a = y/s^{1/2} = Y^*(s^*)/s^{*1/2} \qquad (4.1)$$

as the number of standard deviations that the current estimate of $\mu$ is away from 0, and of

$$\beta = \Phi(a) \qquad (4.2)$$

($\Phi$ denotes the standard normal cumulative distribution function) as the significance of the data for rejecting the hypothesis $\mu = 0$ in favor of the alternative that $\mu < 0$. Note that $t = 1/s$ is the fraction of the total available information already collected. Then the optimal procedure can be regarded as stopping as soon as the hypothesis $\mu = 0$ is rejected in favor of the alternative $\mu < 0$ at the nominal significance level

$$\beta(t) = \Phi(a(t)) = \Phi\left(y(s)/s^{1/2}\right), \qquad (4.3)$$

which varies with t. In variations of the one-armed bandit problem in which the data are not normally distributed, it seems reasonable to use this nominal significance level as a stopping criterion. In order to facilitate future use of the results of this paper, the optimal boundaries z(t), a(t) and $\beta(t)$ are presented in Table 1.
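Converting a boundary point between the scales used in Table 1 is a one-liner per quantity. The sketch below maps a point $(y, s)$ to $(t, z, a, \beta)$ following (4.1) to (4.3), with $\Phi$ computed from `math.erf`; the input point is hypothetical and is not a value taken from Table 1.

```python
import math


def Phi(x):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def boundary_scales(y, s):
    """Convert a boundary point (y, s) to (t, z, a, beta) as in (4.1)-(4.3)."""
    t = 1.0 / s            # fraction of the total information collected
    z = y / s              # (z, t) scale of the introduction
    a = y / math.sqrt(s)   # (4.1): standard deviations from 0
    beta = Phi(a)          # (4.2)/(4.3): nominal significance level
    return t, z, a, beta


# Hypothetical boundary point (not a value from Table 1): y = -3.92 at s = 4.
t, z, a, beta = boundary_scales(-3.92, 4.0)
print(t, z, a, round(beta, 4))
```

Note that $a = z/t^{1/2}$, so the three columns of the table carry the same information in different units.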


Table  1 
As indicated in the introduction, these same methods could be applied with equal facility to any optimal stopping problem of the general form described there. In Petkau (1977) these methods have been employed to obtain the optimal continuation region for a stopping problem in which the optimal continuation region can be described as the set $\{(y,s): y_1(s) < y < y_2(s), s > 1\}$ where $y_1(s) \le y_2(s)$.

The connection between such optimal stopping problems and free boundary problems involving the heat equation of the form (1.4) makes it clear that these same methods could be used to determine numerical solutions of such free boundary problems. The problem of obtaining numerical solutions to free boundary problems has received considerable attention in the literature (see, for example, Sackett (1971) and Meyer (1977)). Whether the methods proposed here provide a reasonable alternative to these more general methods is a question that remains to be answered.


Table 1. One-Armed Bandit Boundary
Figure A. Van Moerbeke's Problem
Figure C. One-Armed Bandit Problem